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ORIGINAL ARTICLE 

Towards a molecular characterization of autism spectrum 
disorders: an exome sequencing and systems approach 

JY An 1 , AS Cristino 1 , Q Zhao 1 , J Edson 1 , SM Williams 1 , D Ravine 2 , J Wray 3 , VM Marshall 1 , A Hunt 4 , AJO Whitehouse 4 ' 5 and 
C Claudianos 1,5 

The hypothetical 'AXAS' gene network model that profiles functional patterns of heterogeneous DNA variants overrepresented in 
autism spectrum disorder (ASD), X-linked intellectual disability, attention deficit and hyperactivity disorder and schizophrenia was 
used in this current study to analyze whole exome sequencing data from an Australian ASD cohort. An optimized DNA variant 
filtering pipeline was used to identify loss-of-f unction DNA variations. Inherited variants from parents with a broader autism 
phenotype and de novo variants were found to be significantly associated with ASD. Gene ontology analysis revealed that putative 
rare causal variants cluster in key neurobiological processes and are overrepresented in functions involving neuronal development, 
signal transduction and synapse development including the neurexin trans-synaptic complex. We also show how a complex gene 
network model can be used to fine map combinations of inherited and de novo variations in families with ASD that converge in the 
L1CAM pathway. Our results provide an important step forward in the molecular characterization of ASD with potential for 
developing a tool to analyze the pathogenesis of individual affected families. 
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INTRODUCTION 

Autism spectrum disorder (ASD) is a highly heritable neurodeve- 
lopmental disorder that is clinically characterized by impaired 
social interaction and communication deficits, as well as restricted 
and repetitive behavior. 1 ASD is generally first apparent in 
childhood and affects up to -1-2% of the population. 2,3 The 
emergence of massively parallel DNA sequencing has identified 
many rare single-nucleotide variants (SNVs) and small insertion/ 
deletion (indels) variations associated with ASD. Recently, whole 
exome sequencing (WES) has verified the contribution of de novo 
variants (DNVs) in ASD, including an increased rate of DNVs 
associated with aged paternity 4 " 7 WES studies have similarly 
clarified the role of rare inherited variants (IVs), such as rare gene 
deletion (inherited loss-of-function homozygous, compound 
heterozygous or X-chromosome variants in males), 8 recessive 
homozygous, 9 bi-allelic variants 10 and coinheritance of variants 
from multiplex families. 11 More recently, whole genome sequen- 
cing has identified rare genetic variants associated with noncod- 
ing DNA and also variants affecting the differential splicing of 
genes. 12,13 These discoveries have led to the insight that hundreds 
of rare genetic variants can influence numerous biological 
functions associated with ASD. 14 Many of these variations occur 
in genes that are comorbid for ASD and other symptom 
complexes (for example, specific language impairments, intellec- 
tual disabilities, epilepsies, schizophrenia) and monogenic syn- 
dromes (for example, Rett syndrome, fragile X syndrome, tuberous 
sclerosis, Timothy syndromes). 3 

Notwithstanding these conceptual advances, large gaps remain 
in our understanding of the genetic basis of ASD. Genetic 
heterogeneity adds to the challenge of understanding the relative 



contributions of numerous rare genetic variants to the severity of 
ASD. 15-17 The influence of differing combinations of rare ASD- 
associated variants on the occurrence of ASD is poorly 
understood. 18 In spite of these problems, attempts have been 
made to reconcile the functional relevance of case-specific rare 
variants. However, these studies are few in number and 
reductionist approaches have struggled to accommodate poten- 
tially hundreds of DNA variants associated with ASD. Single gene/ 
variant functional studies fail to account for the likely combina- 
tions of weak alleles contributing to the cumulative liability 
threshold for ASD. Similarly, the genetic tool-box of model species 
is not accessible for analysis of the relative contribution of 
differing combinations of rare heterozygous human DNA variants. 
In the face of this scientific challenge, the field of molecular 
psychiatric genetics is trying to build a paradigm to grasp the 
'bigger picture' concerning the molecular and biological basis of 
ASD. One such way forward is the use of systems approaches. 19-21 
We recently reported a hypothetical gene network model that 
was used to profile the functional patterns of heterogeneous DNA 
variants overrepresented in ASD, X-linked intellectual disorder, 
attention deficit and hyperactivity disorder and schizophrenia. 1 In 
the current study, we have applied the AXAS model to analyze 
WES data from an Australian ASD cohort. We show how 
combinations of loss-of-function variants cluster in functional 
processes and putatively contribute to the causal profile of 
individuals with an ASD. We also show how DNA variants inherited 
from parents with a broader autism phenotype (BAP) 22 and DNVs 
have a significant association with ASD. 
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MATERIALS AND METHODS 

Recruitment and behavioral assessment 

Participants were drawn from the Western Australian Autism Biological 
Registry, at the Telethon Institute for Child Health Research in Perth, 
Western Australia. Participants were recruited via advertisement and 
children with a DSM-IV-based 23 clinical diagnosis of autistic disorder, 
Asperger's syndrome or pervasive developmental disorder-not otherwise 
specified were included in the study. Diagnosis of ASD was obtained by 
consensus following a multidisciplinary assessment by a team comprising 
a pediatrician, clinical psychologist and speech pathologist. 24 

Forty families from the biological registry, who had either one child or 
multiple children (trio and sibling families) diagnosed with an ASD, were 
randomly selected for the current study. The total sample comprised 48 
cases and 80 parent controls (Supplementary Figure 1). All probands were 
administered the autism diagnostic observation schedule-generic, 25 and 
were found to meet the criteria for ASD. Parents were asked to complete 
the autism-spectrum quotient, a self-report questionnaire that provides a 
quantitative measure of autistic-like traits in the general population 26 and 
a score of 22 or above on the autism-spectrum quotient was indicative of 
the BAP 27 — subclinical expression of behaviors characteristic of ASD. 

The final participant sample comprised 35 simplex families and 5 
multiplex families: 37 families were of European origin and 3 families were 
of Asian origin (Supplementary Figure 1; Supplementary Table 2). Among 
the 48 probands, there were 43 children with a diagnosis of autistic 
disorder, 2 with Asperger's disorder and 3 with pervasive developmental 
disorder-not otherwise specified. Among the 80 parents (controls), there 
were 23 parents (6 mothers and 17 fathers) classified as BAP, 55 parents 
(34 mothers and 21 fathers) classified as non-BAP (Supplementary Table 2) 
and 2 fathers who did not complete the BAP assessment. 

Exome capture sequencing 

Blood was obtained via venipuncture from all 142 participants and 
genomic DNA was extracted from whole blood. We used 1 ug of genomic 
DNA to construct whole genome libraries. Multiplexed genomic libraries of 
four different individuals were subjected to whole exome capture using 
NimbleGen SeqCap version 3. Exome libraries were sequenced with paired- 
end 100 bp reads on the lllumina HiSeq2000 (San Diego, CA, USA) using 
lllumina SBS V3 chemistry. This approach captures 62 Mb of the genomic 
sequence comprising exonic variants (variants in protein-coding portion), 
ncRNA variants (variants in noncoding RNAs), untranslated region (UTR) or 
intronic variants (variants in UTRs and intron regions) and intergenic 
variants (variants between mRNA transcripts of two genes). 

Variant identification 

Raw sequencing data were mapped by Burrows-Wheeler Aligner 28 on the 
hg19 reference genome from 1000 Genome Project (human_g1 k_v37) 
followed by removal of PCR duplicates using Picard software (http://picard. 
sourceforge.net). Realignment of the mapped reads was performed using 
the Genome Analysis Toolkit (version 2.6-4-g3e5ff60) 29 For variant calling, 
we used the Genome Analysis Toolkit UnifiedGenotyper to call raw 
genotypes from sequenced regions that contain single-nucleotide poly- 
morphisms (SNPs) and small indels. To achieve a high-quality call set, we 
used multisample calling and Variant Quality Score Recalibrator with 
training data sets (HapMap3, 1k genome and dbSNP). 30,31 We discarded 
genotypes that were likely to be false positives or of poor quality from the 
list of variants with the following criteria: (1) heterozygous genotypes in 
the X-chromosome in males, (2) genotypes in the Y chromosome in 
females, (3) genotypes covered by fewer than 20 reads, (4) genotypes with 
a Phred-scaled base quality score lower than 30 and (5) genotypes with 
indels that are frequently observed across multiple samples. 

We assessed SNPs and indels using ANNOVAR software. 32 We annotated 
variants using the hg19 reference genome and prioritized these by their 
minor allele frequency using the 1000 Genomes Project (the April 2012 
release for European and Asian populations) and the NHLBI-6500 Exomes 
(this version includes sex chromosomes and indels for all ethnicity groups). 
We determined the functional significance of variants using scale-invariant 
feature transform (SIFT) and PolyPhen2 (the LJB version 2) for missense 
mutations 33 and used the SIFT Indel tool (http://sift.bii.a-star.edu.sg/www/ 
SIFT_indels2.html) for frameshift indels. 34 Finally, we excluded genomic 
regions that are likely to contribute false-positive signals in variant calling 
of exome sequencing 35 



De novo and inherited variant selection 

We used the phase-by-transmission, a Genome Analysis Toolkit module, to 
identify DNVs and IVs from given familial information. The most likely 
genotype was estimated by computing the posterior probability for all 
possible genotypes in each family from the raw genotype likelihoods and 
expected prior probability of transmission (P< 0.001). For DNVs, we chose 
de novo calls with ^ 20 Transmission Probability (a Phred-scaled probability 
of erroneous call ^ 1%) and ^ 20 read depth. We further reduced de novo 
calls in cases where the depth in number of sequence reads of alternative 
allele is greater than 10% in the maternal or paternal genotype. We then 
randomly selected 10 DNVs and successfully validated their genotype by 
targeted Sanger sequencing using custom primers (Supplementary Figure 
2). For IVs, we only considered genotypes that were successfully phased in 
the parents. 

MicroRNA gene and target site variants 

Target site prediction from the 3'-UTR of gene transcripts found in UCSC 
'knownGene' annotation on hg19 reference genome were conducted for 
all microRNAs (miRNAs) found in miRBase Release 19. 36 Predicted sites 
were then mapped back onto the genome, and compared with the refSeq 
gene annotation. Two miRNA target site predictions tools miRanda 
(v3.3a) 37 and RNAhybrid (v2.1.1) 38 were used to confirm target sites with 
a minimum free energy less than -18 and parameters optimized for 
human S'-UTRs (options: -b 2000, -e -18, -s 3utr_human). MiRanda 
was run with default parameters. The predicted miRNA target sites were 
then filtered using a free energy of -25 or less for RNAhybrid and - 18 for 
miRanda. The sites used in this analysis are those predicted by miRanda, 
with a 75% reciprocal overlap with RNAhybrid predictions. Bedtools was 
used to find SNVs that were located in predicted miRNA target regions. 39 

Protein-protein interaction network 

We utilized the protein-protein interaction (PPI) network based on the 
AXAS model, 1 for variant-classification and systems biology analyses. We 
retrieved all possible interactions between primary candidate genes and 
their first-order interactors from the whole PPI network. The PPI network 
visualization and functional analysis was performed as previously 
reported 1 using Cytoscape. 40 We examined nonredundant DNA variants 
found in our sequence analysis using a binomial distribution with a 
standardized Z-score to test whether sets of genes are overrepresented in 
the AXAS-PPI network of four different mental health disorders; ASD, 
X-linked intellectual disorder, attention deficit and hyperactivity disorder 
and schizophrenia. 1 

Functional ontology analysis 

We used ClueGo version 1.8 (a Cytoscape plug-in), 41 to perform functional 
enrichment analysis based on annotations of three functional ontology 
databases: gene ontology (GO) (http://geneontology.org/), Kyoto encyclo- 
pedia of genes and genomes (KEGG) pathways (http://www.genome.jp/ 
kegg/) and Reactome (http://www.reactome.org/). The statistical signifi- 
cance of functional terms was calculated with the Fisher's Exact Test and 
adjusted using Bonferroni step-down correction. We only used GO terms 
that are annotated based on experimental evidence in the biological 
process ontology, and excluded all KEGG disease pathways. 



RESULTS 

Exome sequencing 

Exome sequencing was performed on ASD families 
(Supplementary Figure 3), with a 57.9 x average sequence read 
depth and 71.5% capture efficiency across the target regions 
(Supplementary Table 2). Approximately 50 Mb of the exome was 
covered by ^20x reads from which we confidently identified 
DNA variants. There was an average of 38 686 SNPs and 2420 
indels per individual exome. Of these, 13 919 SNPs and 155 indels 
were located in protein-coding sequences, and 24 767 SNPs and 
2265 indels found in noncoding sequences (Figure 1a). Most of 
the SNPs were identified in exonic (13 919; 36.0%) and intronic 
regions (14 087; 36.4%), whereas indels were mostly found in 
intronic regions (1382; 57.1%) and 3'-UTRs (453; 18.7%). Among 
variants in noncoding sequences, there were 1773 SNPs and 103 
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Figure 1. Genome distribution of DNA variants. Circle plots show the type and localization of filtered DNA variants detected by exome 
sequence analysis. The location of variants was annotated using the Reference Sequence hg19 database with ANNOVAR software, (a) SNPs 
were equally distributed in both intronic and exonic regions, whereas most indels were observed in intronic and 3'-UTR regions of gene 
transcripts, (b) DNA variants were also found to occur in noncoding RNA sequences. SNP, single-nucleotide polymorphism; UTR, untranslated 
region. 



indels in regions that encode a miRNA or long noncoding RNA 
(Figure 1b). There were a total of 9650 SNVs that occur within 
predicted miRNA target sites of 3'-UTRs (from 2 624 853 predicted 
sites in all). 

Regarding Mendelian inheritance, most of the variants were 
inherited from parents but 44 variants newly occurred (c/e novo) in 
28 ASD children. Among 44 DNVs, there were 29 associated with 
protein-coding sequences, 9 were intronic, 4 were intergenic and 
2 occur in 3'-UTR regions of genes. Aside from variants associated 
with noncoding sequences, we selected variants that result in 
changes to protein-coding sequences (missense, nonsense and 
frameshift variants) for analysis. Although coding variants can 
more obviously be associated with loss of function, we are mindful 
that noncoding variants detected using exome sequencing offer 
important insights into the aberrant expression of coding genes 
and noncoding RNAs (Figure 1b). 

Putative causal variants 

Many previous exome studies have used public databases for 
prioritizing putative causal variants from raw variants based on 
information of allele frequency or a functional prediction score. 
For example, databases of the 1000 Genomes Project and 
NHLBI-6500si Exomes were used for measuring minor allele 
frequency (MAF) of variants, and those of SIFT and PolyPhen2 
were used for predicting a functional impact of an amino acid 
substitution caused by variants. Typically, exome studies select 
putative causal variants that are rare in a population (those 
variants with < 5% MAF or unknown in the 1000 Genomes Project 
and NHLBI-6500si Exomes) or alter a protein function (those 
variants classified as 'damaging' in SIFT and PolyPhen2 
predictions). 11 ' 42-45 

We tested how the filtering process affects the prioritization of 
causal variants (Supplementary Figure 4) using various thresholds 
and exclusion parameters with the AXAS-PPI model which 
assumes that there is a set of genes significantly associated with 
ASD. We retrieved genes for which variants occurred in cases or 



controls (parents) to test whether these genes are overrepre- 
sented in the ASD-PPI network model. 1 We found that by lowering 
the MAF threshold we were able to more stringently filter DNA 
variants on the basis of allele frequency. The differences in 
association of variants with ASD between cases and controls 
became smaller when using a < 2% MAF setting (with the 1000 
Genomes Project data set; Supplementary Figure 4a) and < 1% 
MAF (with the NHLBI-6500si Exomes data set; Supplementary 
Figure 4b), respectively. When examining the SIFT variant filtering 
process, the association of variants from cases was higher than 
from controls, within the range of thresholds for the prediction of 
the damaging classification (0^ SIFT score ^0.05; Supplementary 
Figure 4c). The association of variants from parent controls 
decreased with a SIFT score of 0.1, the threshold associated with a 
damaging classification. When using PolyPhen2, the association of 
variants from cases became higher when increasing the cutoff 
within the range of thresholds for the prediction of the damaging 
classification (0.447^ PolyPhen2 score < 1; Supplementary Figure 
4d), whereas association of variants from controls did not change 
much within this range. Therefore, rare variants with low allele 
frequency and significant functional impact as determined by the 
AXAS model are most likely to be associated with ASD. 

On the basis of these observations, we defined causal variants 
as those meeting the following criteria: (1) variants annotated as a 
missense/nonsense and frameshift indel, (2) variants that have 
minor allele frequency (MAF) < 0.01 or are not reported in the 
1000 Genomes Project and NHLBI-6500si Exomes, (3) missense/ 
nonsense variants classified as damaging by both SIFT (0^ 
prediction score ^0.05) and PolyPhen2 (0.447^ prediction score 
^1) and (4) frameshift variants classified as damaging by SIFT 
Indel tool. Next, we conducted the process of variant filtering and 
reduced the number of variants by subsequent combinations of 
variant databases (Figure 2a). On the basis of this stringent variant 
filtering pipeline, combining four variant databases with appro- 
priate thresholds, we were able to successfully predict rare DNA 
variants in ASD cases. These putative causal variants were shown 
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Figure 2. Overview of variant filtering pipeline, (a) A detailed diagram of the computational pipeline and databases used to filter DNA variants 
found in WES data. Shown are the average number of variants per individual, (b) Comparison of databases showing the association of filtered 
variants with ASD. The level of association was measured using a binomial distribution with a standardized Z-score as determined by the 
AXAS model. 1 ADHD, attention deficit and hyperactivity disorder; ASD, autism spectrum disorder; MAF, minor allele frequency; PPI, 
protein-protein interaction; SZ, schizophrenia; WES, whole exome sequencing; XLID, X-linked intellectual disorder. 



to be significantly associated with ASD but not with other 
neurodevelopmental or neuropsychiatric disorders (Figure 2b). 

Variant classification 

Our exome analysis identified DNA variations in 1754 and 2607 
candidate genes in cases and parent controls, respectively 
(Supplementary Table 4). As described above, we used the 
AXAS-PPI network as a biological assumption set to test the 
association of these genes with ASD. We found that variations in 
candidate genes of ASD children were significantly associated 
with the ASD-PPI network (P = 0.04) (Figure 3a). Notably, variants 
of candidate genes found in parent controls were not significantly 
associated with ASD. 

Although putative causal variants in control parents did not 
reach statistical significance, these variants were tightly associated 
with ASD and a higher Z-score for ASD compared with other 
disorders (Figure 3a). Therefore, we assessed prospective causal 
variants by examining their association with BAP and non-BAP 
parents. We found that 1053 and 2174 candidate genes contain- 
ing putative causal variants occurred in 23 parents with BAP and 
55 parents without BAP, respectively (Supplementary Table 4). The 
causal variants associated with candidate genes in parents with 
BAP were shown to be significantly associated with ASD (P = 0.02), 
whereas those in parents without BAP were not (Figures 3d and 
3f). Importantly, this suggests that parents with weaker or milder 
behavioral deficits confer significant ASD risk as determined by 
our variant-classification system. 



Mendelian inheritance of ASD risk 

We further examined whether Mendelian inheritance confers a 
risk for ASD by comparing causal variants inherited from parents 
with BAP (IVs-BAP) and those from parents without BAP (IVs non- 
BAP). We found that a total of 764 and 1271 candidate genes that 
contained causal variants occur in these groups, respectively 
(Supplementary Table 4). Variant classification showed that IVs- 
BAP are significantly associated with the ASD-PPI network 
(P = 0.002), whereas IVs from non-BAP parents are not (Figures 
3e and 3g). This finding was consistently observed when causal 
variants were separated using the weak ASD phenotype status of 
the parents to measure the basis of ASD inheritance 
(Supplementary Table 1). Therefore ASD-associated variants found 
in parents with BAP are more likely to confer ASD risk to their 
children. 

Our results also revealed that candidate genes containing DNVs 
from ASD cases have significant association with the ASD-PPI 
network (P = 0.01). This is consistent with recent findings that 
DNVs are important genetic factors that contribute to sporadic 
ASD cases. 46 We also examined the possible relationship of 
paternal age with frequency of DNVs. We found that there was a 
significant positive correlation between DNVs and the age of the 
father, confirming recent reports 4-7 of an increased risk of ASD 
with paternal age (Supplementary Figure 5). There is, however, a 
possibility that some filtered DNVs may arise somatically in 
immune cell lineages found in blood. Future targeted re- 
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Figure 3. Variant classification using the AXAS model. Radar plots showing binomial distributions with standardized Z-scores used to examine 
the association of putative causal DNA variants with the AXAS-PPI model. 1 Z-scores ^1.65 show significant association with a 
neurodevelopmental disorder. Inherited variants from BAP parents (g) and de novo variants (c) show the greatest association with ASD; 
see also Supplementary Table SI. ADHD, attention deficit and hyperactivity disorder; ASD, autism spectrum disorder; BAP, broader autism 
phenotype; SZ, schizophrenia; XLID, X-linked intellectual disorder. 



sequencing of DNA from a second (nonimmune) cell type would 
comprehensively verify parental origin of DNVs. 

Convergent pathway in ASD cases 

Although there were many putative ASD variants identified in 
probands, it still remains unclear how these variants contribute to 
biological/behavioral phenotypes associated with ASD. Examining 
these variants in control parents with BAP does, however, provide 
an insight as to how these are likely to combine in the next 
generation and contribute to the presentation of an ASD. We 
postulated that causal variants of different origins, inherited and 
de novo, converge in molecular pathways and process so as to 
contribute to a critical threshold of liability that results in an ASD. 
To test this hypothesis, we examined inherited and de novo causal 
variants from selected ASD cases, and constructed the PPI network 
of these families and probands. We used functional ontology 



analysis for these PPI networks to search for convergent pathways 
that are enriched in the ASD-PPI network. Fine mapping 
maternally and paternally inherited causal variants and DNVs in 
the context of their molecular interactions provides a precise view 
of how biological pathways are likely to be functionally 
compromised (Figure 4). 

We found that there were a total of 1 16 GO, 33 KEGG and 94 
Reactome terms identified in convergent variant pathway analysis 
of the 48 ASD cases (Supplementary Table 5). Of note, there was 
an increased frequency of affected pathways associated with 
synaptic development and neurodevelopment (Table 1). Clearly, 
incorrect synapse development and erosion of synaptic function 
due to genetic variants are widely considered to be key 
contributors to ASD. 47,48 These pathways encompass long-term 
potentiation (KEGG:04720), trafficking of AMPA receptors (REAC- 
TOME:18307) and activation of NMDA receptors upon glutamate 
binding and postsynaptic events (REACTOME:20563). These data 
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Figure 4. Clustering analysis in ASD cases. PPI networks were constructed using putative causal variants and their first-degree protein 
interactors that result in three types of affected networks. Two possible schemes (a and b) show how maternally inherited variants (yellow), 
paternally inherited variants (green) and de novo variants (blue) converge and cluster in unique functional pathways (gray). Functional 
ontology of three proband cases: (c) 06.s1, neuronal migration; (d) 30.s1, long-term potentiation, (e) 38.s1, axon guidance and inherited 
variants that merge with de novo variants that converge in (f) 18.s1, ankyrin and adhesive related proteins involved the synapse development 
and neurodevelopment. Of the 48 ASD cases, there were four unique combinations (pedigrees) of variants that cluster in the L1CAM 
interaction pathway (REACTOME:22205; see Supplementary Table 5 for details of gene names). ASD, autism spectrum disorder; PPI, 
protein-protein interaction. 



Table 1. Pathways relevant to ASD pathophysiology 



Pathway ID 



Pathway name 



Case ID 



GO:0001764 Neuron migration 04.s1, 06.s1 

GO:0035176 Social behavior 36.s2 

GO:0071625 Vocalization behavior 36.s2 

KEGG:04720 Long-term potentiation 11.s1,19.s1, 

22.S1 

KEGG:04722 Neurotrophin signaling pathway 13.s1, 37.s3 

REACTOME:18307 Trafficking of AMPA receptors 17.s1 

REACTOME:18266 Axon guidance 10.s1, 38.s1 

REACTOME:20563 Activation of NMDA receptor upon 22.s1, 32.s1, 

glutamate binding and 33.s1, 35.s1 

postsynaptic events 



Abbreviations: ASD, autism spectrum disorder; GO, gene ontology; KEGG, 
Kyoto encyclopedia of genes and genomes. 



again highlight neurodevelopment as an important functional hub 
associated with ASD pathogenesis. Multiple cases were shown to 
have affected pathways related to neuron migration 
(GO:0001764), neurotrophin signaling (KEGG:04722) and axon 
guidance ( RE ACTOME: 18266). These analyses also identified 
affected pathways more directly linked to clinical phenotypes, 
such as social behavior (GO:0035176) and vocalization behavior 
(GO:0071625). 

Notably we found that there were a number of putative causal 
DNA variations associated with the neurexin trans-synaptic 
complex (NTSC) that has previously been associated with 



autism. There were two putative causal DNA variations found 
in neurexins (NRXNs), CNTNAP 3 and CNTNAP 5. We also found a 
number of variations among interacting molecules of the NTSC 
specifically those that interact with neuroligins (NLGN), SH3 and 
multiple ankyrin repeat domains (SHANKs) and NRXN. These 
include SNVs that occur in DLG4 (postsynaptic density protein 95), 
DLGAP2 (disks large-associated protein 2), LPHN2 (latrophilin-2), 
SPTAN1 (spectrin, alpha, non-erythrocytic 1), CASK (calcium/ 
calmodulin-dependent serine protein kinase), PDZD2 (PDZ domain 
containing 2), DDX24 (DEAD (Asp-Glu-Ala-Asp) box helicase 24), 
SIPA1L1 (signal-induced proliferation-associated 1 like 1) and 
NXPH3 (neurexophilin 3). We found SNVs also occur in other 
potential NRXN interacting molecules include LRRTM4 (leucine- 
rich repeat transmembrane neuronal 4), CNTN3 (contactin 3) and 
CNTN4 (contactin 4). Aside from synapse development, another 
striking association was 4/48 cases involved convergence of DNA 
variants in the L1CAM interaction pathway. The L1CAM interaction 
pathway consists of the L1 family of cell adhesion molecules, 
which has an important role in neuronal migration, axon guidance 
and synaptic formation. 51,52 Genetic variants in L1CAM have been 
associated with a wide range of neurodevelopmental and mental 
health disorders. 53,54 

Our results also show pathways involved in cellular assembly, 
circadian rhythm, immune response, protein catabolism/modifica- 
tion and various signal transductions are also associated with ASD 
(Supplementary Table 5), and may be involved in developmental 
regression, locomotion impairment, metabolic abnormalities and 
sleep disturbances 55 
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DISCUSSION 

We have previously shown that the AXAS model provides a 
hypothetical framework to analyze and profile the molecular basis 
of neurodevelopmental disorders such as ASD. 1 This a priori 
molecular network provides a 'biological assumption' for analysis 
of WES genetic screening data from Australian ASD families. 
Although heterogeneity of causal DNA variants is an inherent 
property of the human genome, 56 as well as in ASD, 3,14,57 we show 
that the AXAS model can successfully associate DNA variations 
with ASD from WES cohort data. This is because the AXAS network 
approach provides a means to recognize patterns of candidate 
genes that are putatively overrepresented in functional processes 
and pathways associated with ASD. Notably, we used this 
approach to search for neurofunctional pathways associated with 
ASD among individual cases and parents by investigating higher- 
order interaction of combinations of DNA variants (Figure 4). We 
confirm that putative causal variations often cluster in functional 
pathways and represent a molecular convergence of mostly 
heterozygous weak alleles that likely erode biological processes. 
Fine mapping of causal variants may therefore create plausible 
links between genotype and phenotype. 18,58 

We examined IVs comparing BAP and non-BAP parents to trace 
combinations of genetic determinants associated with ASD in the 
probands. Previous studies have found broader ASD-like beha- 
vioral phenotypes or clinical subtypes in parents, siblings or 
relatives of individuals with ASD. 22,59-61 Our results suggest that 
there is a very strong genetic association of IVs from BAP parents 
with ASD (Figure 3), confirming a direct relationship between 
subclinical behavioral and genetic phenotypes in parents whose 
children present with ASD. Given that IVs are subject to 
evolutionary selection and putatively segregate in a distribution 
of effect including a contribution to weak BAP phenotypes in 
parents, de novo DNA variations are not subject to selection. We 
found that DNVs detected in the probands also had a strong 
statistical association with ASD and, to a lesser extent, with 
X-linked intellectual disorder and attention deficit and hyper- 
activity disorder compared with controls (non-BAP parents; 
Figure 3). This is probably due to the random occurrence of DNVs 
in neurological genes that are comorbid between mental health 
disorders. 1 Therefore future genetic investigations need to 
estimate the statistical significance of IVs from BAP and non-BAP 
parents with DNVs, which can only be achieved through genetic 
screening of families. 

The WES approach has limitations as it only detects individual 
genetic variation based on SNPs and small indels associated with 
exonic regions of the genome. WES does not detect large copy 
number variations or genetic variants of regulatory sequences 
located in intergenic regions. Copy number variations are 
important genetic contributors that account for -10% of ASD, 
including associated syndromes. 55,62 We did, however, formally 
examine DNA variations that occur in noncoding regions 
associated with WES data (Figure 1), including variations in 
noncoding RNAs such as miRNAs. Mutations were detected in 
miRNA transcripts and these may have implications for miRNA 
precursor stability and targeting on a broad scale. 63 There were 
also many SNVs in miRNA target binding sites and these 
potentially have a more specific effect on miRNA targeting 
activity. 63,64 Future screening studies could more comprehensively 
include using high density-array genotyping for detection of copy 
number variations and SNPs associated with noncoding informa- 
tion, whereas whole genome sequencing remains an option. 

Although WES cannot provide a full-genome accounting of 
ASD, it does detect polygenic DNA variations associated with 
coding information that often cluster and disrupt neurodevelop- 
mental pathways. A closer examination of putative causal DNA 
variations associated with the L1CAM interaction pathway 
involved in axon guidance (Figure 4) highlights how combinations 



of gene variations could disrupt biological function. There is an 
overlap of putative causative variants between case 06.s1 
(Figure 4c) and 30.s1 (Figure 4d), including a variant of NFASC 
(neurofascin; chr1:204944474) that occurs in 14 families. Regard- 
ing the first-degree interactors of NFASC, we find that DCX 
(doublecortin) and NrCAM (neuronal cell adhesion molecule) have 
been previously associated with ASD 65,66 Case 06.s1 also contains 
a genetic variant of SPRK2 (serine/threonine protein kinase 2; 
chr7:1 04909268) that interacts with VAV2 (guanine nucleotide 
exchange factor VAV2) that is involved in the L1CAM pathway. In 
cases 6.s1 and 30.s1, there is a genetic variant of NPEPPS 
(puromycin-sensitive aminopeptidase; chrl 7:45669359) that is 
more commonly found in 33 families. NPEPPS has previously 
been shown to interact with RPS6KA3 (ribosomal protein S6 
kinase, polypeptide 3), and variants of NPEPPS have been 
associated with ASD and with epileptic encephalopathy 67 The 
more frequent occurrence of these NFASC and NPEPPS variants 
associated with the L1CAM pathway may therefore represent 
important genetic elements that predispose to ASD. 

In case 38.s1 (Figure 4e), there is a rare variant of DLG4 (disks 
large homolog 4; chrl 7:7097689) that also occurs in family 39. 
DLG4 is involved in the L1CAM pathway and has many first- 
degree interactors, including another rare variant of ITGA4 
(integrin, alpha 4; chr2:1 82347303). Like a number of other L1 

family genes, ITGA4 has also previously been associated with 
ASD 68-70 |n case 18s1 ( Figure 4f)/ we found a DNV 

(chr2:1 66231 244) of SCN2A (sodium channel, voltage-gated, type 
II, alpha subunit). SCN2A is a well-known candidate ASD gene 6,12 
which interacts with ANK3 (ankyrin 3). Ankyrins are important 
because they link voltage-gated sodium and potassium channels 
to spectrin, L1 and NrCAM. In addition, we find genetic variants of 
ERBB2IP (erbb2 interacting protein; chr5:653 17206) and MACF1 
(microtubule actin cross-linking factor 1; chrl: 39851427) that are 
also associated with the L1CAM pathway. Taken together, rare 
genetic variants and DNVs that occur in combination in probands, 
as detailed above, may therefore increase genetic liability 
associated with key processes involved in neuronal development 
and plasticity. 

In summary, the AXAS model is a tool that can be used to 
abstract a molecular basis of ASD, helping to engage genetic 
screening data with hypothesis-driven paradigms based on 
biological evidence. Our findings provide a means to identify 
the functional consequence of combinations of causal DNA 
variations associated with ASD. This study raises the intriguing 
prospect of how we might examine the unique pathogenesis of 
individual families with ASD and the potential application of 
targeted therapies and personalized medicine. 
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